Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the number of realizations required to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions of the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
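As a rough illustration of the FTRL idea in self-play, the sketch below runs FTRL with a negative-entropy regularizer (which reduces to exponential weights) on a small zero-sum matrix game. It is not Balanced-FTRL or Adaptive-FTRL: it assumes full-information feedback in a normal-form game rather than trajectory feedback over information sets, and the payoff matrix, the step size `eta`, and all function names are made up for the example.

```python
import math

def ftrl_entropy(cum_loss, eta):
    """FTRL step with a negative-entropy regularizer.

    The minimizer of <p, L> + (1/eta) * sum_i p_i log p_i over the simplex
    is the softmax of -eta * L (shifted by min(L) for numerical stability).
    """
    shift = min(cum_loss)
    weights = [math.exp(-eta * (loss - shift)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(payoff, rounds, eta):
    """Both players run FTRL; payoff[i][j] is the row player's loss
    and the column player's gain. Returns the average strategies."""
    n, m = len(payoff), len(payoff[0])
    row_loss, col_loss = [0.0] * n, [0.0] * m
    row_avg, col_avg = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        p = ftrl_entropy(row_loss, eta)
        q = ftrl_entropy(col_loss, eta)
        for i in range(n):
            # Expected loss of row action i against the column strategy q.
            row_loss[i] += sum(payoff[i][j] * q[j] for j in range(m))
            row_avg[i] += p[i] / rounds
        for j in range(m):
            # The column player's loss is the negated payoff.
            col_loss[j] -= sum(payoff[i][j] * p[i] for i in range(n))
            col_avg[j] += q[j] / rounds
    return row_avg, col_avg

def exploitability(payoff, p, q):
    """Duality gap of (p, q): best-response value against p
    minus best-response value against q."""
    n, m = len(payoff), len(payoff[0])
    col_best = max(sum(payoff[i][j] * p[i] for i in range(n)) for j in range(m))
    row_best = min(sum(payoff[i][j] * q[j] for j in range(m)) for i in range(n))
    return col_best - row_best

# Hypothetical 2x2 game, chosen only for illustration.
game = [[2.0, -1.0], [-1.0, 1.0]]
p_avg, q_avg = self_play(game, rounds=5000, eta=0.02)
gap = exploitability(game, p_avg, q_avg)
```

The duality gap of the averaged strategies is bounded by the sum of the two players' regrets divided by the number of rounds, which is why low-regret self-play yields approximately optimal strategies.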
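Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, because repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unified framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of existing approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates. The framework also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries and which lead to performance gains in practice.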
Federated learning is a collaborative model training method that iterates model updates at multiple clients and aggregates the updates at a central server. Device and statistical heterogeneity among the participating clients degrades performance, so an appropriate weight should be assigned to each client in the server's aggregation phase. This paper employs deep unfolding to learn weights that adapt to the heterogeneity, yielding a model with high accuracy on uniform test data. The results of numerical experiments indicate the high performance of the proposed method and the interpretable behavior of the learned weights.
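The server-side aggregation step with per-client weights can be sketched as follows. This only shows a weighted average with the weights supplied by the caller; how the weights are learned by deep unfolding is not shown, and the function name and the flat-list parameter layout are assumptions made for the example.

```python
def aggregate(client_params, weights):
    """Weighted average of client parameter vectors.

    client_params: list of equal-length lists of floats, one per client.
    weights: one non-negative weight per client (normalized here).
    """
    if len(client_params) != len(weights):
        raise ValueError("need exactly one weight per client")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    return [
        sum(w * params[k] for w, params in zip(weights, client_params)) / total
        for k in range(dim)
    ]

# Two hypothetical clients; the second one is trusted three times as much.
global_params = aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
# global_params == [2.5, 3.5]
```

Setting every weight to the client's sample count would recover a plain sample-weighted average, so learned weights can be read as a correction to that baseline.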
We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. The existing centralized methods assume the number of robots and tasks to be fixed, which is inapplicable to scenarios that differ from the learning environment. Meanwhile, the existing distributed methods limit the minimum number of robots and tasks to a constant value, making them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of robots observing the object. To make it applicable to various numbers of robots and objects with different and unknown weights, we propose a framework using multi-agent reinforcement learning for task allocation. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural network-based distributed policy model that determines the timing for coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. Then, the policy is optimized by multi-agent reinforcement learning through trial and error. This structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by numerical simulations.
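We propose a novel approach to action sequence planning based on a combination of affordance recognition and a neural forward model that predicts the effects of affordance execution. By performing affordance recognition on predicted futures, we avoid relying on explicit affordance effect definitions for multi-step planning. Because the system learns affordance effects from experience data, it can foresee not only the canonical effects of an affordance but also situation-specific side effects. This allows the system to avoid planning failures caused by such non-canonical effects and to exploit non-canonical effects to achieve a given goal. We evaluate the system in simulation on a set of test tasks that require consideration of both canonical and non-canonical affordance effects.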
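In edge computing, where computational resources (speed, memory size, and power) are limited, suppressing data size is a challenge for machine learning models that perform complex tasks such as autonomous driving. Efficient lossy compression of matrix data has been introduced by decomposing the matrix into the product of an integer matrix and a real matrix. However, optimizing this decomposition is difficult because it requires optimizing integer and real variables simultaneously. In this paper, we improve this optimization by utilizing recently developed black-box optimization (BBO) algorithms with an Ising solver for the integer variables. In addition, the algorithm can be used to solve mixed-integer programming problems that are linear in the real variables and nonlinear in the integer variables. The differences between the choices of Ising solver (simulated annealing, quantum annealing, and simulated quenching) and between the strategies of the BBO algorithms (BOCS, FMQA, and their variations) are discussed with a view to further development of BBO techniques.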
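To make the integer-times-real decomposition concrete, here is a minimal sketch under strong simplifying assumptions: the integer factor is a single column with entries in {-1, +1}, the real factor is a single row, and the integer part is found by exhaustive search rather than by a BBO algorithm with an Ising solver. The function name and the test matrix are hypothetical.

```python
from itertools import product

def fit_rank_one(W):
    """Approximate W (n x d) by the outer product of m in {-1, +1}^n
    and a real vector c of length d, minimizing the squared error.

    For a fixed m the optimal c is the least-squares solution
    c[k] = sum_i m[i] * W[i][k] / n, so only m has to be searched.
    """
    n, d = len(W), len(W[0])
    best = None
    for m in product((-1, 1), repeat=n):
        c = [sum(m[i] * W[i][k] for i in range(n)) / n for k in range(d)]
        err = sum((W[i][k] - m[i] * c[k]) ** 2
                  for i in range(n) for k in range(d))
        if best is None or err < best[2]:
            best = (m, c, err)
    return best

# A matrix that happens to be exactly rank one with a sign pattern.
W = [[1.0, 2.0], [-1.0, -2.0], [1.0, 2.0]]
m, c, err = fit_rank_one(W)
approx = [[m[i] * c[k] for k in range(len(c))] for i in range(len(m))]
```

The exhaustive loop visits 2^n sign patterns, which is exactly the cost that motivates handing the integer variables to a dedicated solver once n grows.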
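Quadratic unconstrained binary optimization (QUBO) solvers can be applied to design an optimal structure that avoids resonance. QUBO algorithms running on classical or quantum devices have succeeded in some industrial applications. However, their use is still limited by the difficulty of transforming the original optimization problem into a QUBO. Recently, black-box optimization (BBO) methods have been proposed to tackle this issue, using a machine learning technique and a Bayesian treatment for combinatorial optimization. We employed BBO methods to design a printed circuit board for resonance avoidance. The design problem is formulated to maximize the natural frequency while simultaneously minimizing the number of mounting points. The natural frequency, which is the bottleneck for the QUBO formulation, is approximated by a quadratic model in the BBO method. We demonstrate that BBO using a factorization machine performs well in both calculation time and the success probability of finding the optimal solution. Our results can open up the potential of QUBO solvers for other applications in structural design.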
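For readers unfamiliar with the QUBO form, the sketch below evaluates the objective $x^\top Q x$ over binary vectors and minimizes it by brute force. The exhaustive search stands in for an actual QUBO solver, and the matrix `Q` is a toy example, not a model of the circuit board problem.

```python
from itertools import product

def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_qubo(Q):
    """Enumerate all 2^n binary vectors and return (argmin, minimum energy)."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Toy upper-triangular Q: each selected variable is rewarded (-1 on the
# diagonal), while selecting adjacent variables together is penalized (+2).
Q = [[-1, 2, 0],
     [0, -1, 2],
     [0, 0, -1]]
solution, energy = brute_force_qubo(Q)
# solution == (1, 0, 1), energy == -2
```

In a BBO setting the entries of `Q` would not be written by hand; they would come from a quadratic surrogate fitted to evaluations of the expensive objective.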